Resampling imbalanced data for network intrusion detection datasets


Abstract

Machine learning plays an increasingly significant role in the building of Network Intrusion Detection Systems. However, machine learning models trained with imbalanced cybersecurity data cannot recognize minority data, hence attacks, effectively. One way to address this issue is to use resampling, which adjusts the ratio between the different classes, making the data more balanced. This research looks at resampling's influence on the performance of Artificial Neural Network multi-class classifiers. The resampling methods, random undersampling, random oversampling, random undersampling and random oversampling, random undersampling with Synthetic Minority Oversampling Technique, and random undersampling with Adaptive Synthetic Sampling Method, were used on benchmark cybersecurity datasets, KDD99, UNSW-NB15, UNSW-NB17 and UNSW-NB18. Macro precision, macro recall, and macro F1-score were used to evaluate the results. The patterns found were: first, oversampling increases the training time and undersampling decreases the training time; second, if the data is extremely imbalanced, both oversampling and undersampling increase recall significantly; third, if the data is not extremely imbalanced, resampling will not have much impact; fourth, with resampling, mostly oversampling, more of the minority data (attacks) were detected.
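To make the resampling idea and the macro-averaged metrics concrete, the following is a minimal Python sketch using only the standard library. It is not the authors' code: the function names `random_resample` and `macro_scores`, the toy labels, and the target of 20 samples per class are all assumptions chosen for illustration. It covers only random undersampling and random oversampling (SMOTE and ADASYN are not shown), and it takes macro F1 to be the unweighted mean of per-class F1 scores, which is one common convention.

```python
import random
from collections import Counter, defaultdict


def random_resample(labels, target_per_class, seed=0):
    """Return sample indices so every class has exactly target_per_class items.

    Classes larger than the target are randomly undersampled (without
    replacement); smaller classes are randomly oversampled (with replacement).
    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        if len(indices) >= target_per_class:
            # Undersample: keep a random subset of the majority class.
            chosen.extend(rng.sample(indices, target_per_class))
        else:
            # Oversample: keep all originals and add random duplicates.
            extra = rng.choices(indices, k=target_per_class - len(indices))
            chosen.extend(indices + extra)
    rng.shuffle(chosen)
    return chosen


def macro_scores(y_true, y_pred):
    """Macro precision, recall and F1: unweighted means of per-class scores."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy imbalanced label set: 95 "normal" records and 5 "attack" records.
labels = ["normal"] * 95 + ["attack"] * 5
balanced = random_resample(labels, target_per_class=20)
print(Counter(labels[i] for i in balanced))  # 20 samples of each class

# A classifier that always predicts the majority class has high accuracy
# on this data but a macro recall of only 0.5, because the minority class
# contributes a recall of 0 with equal weight.
print(macro_scores(labels, ["normal"] * len(labels)))
```

Because macro averaging weights every class equally regardless of its size, a model that ignores the minority (attack) class is penalized heavily, which is why these metrics suit imbalanced intrusion detection data.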


Similar resources

Towards Generating Real-life Datasets for Network Intrusion Detection

With exponential growth in the number of computer applications and the sizes of networks, the potential damage that can be caused by attacks launched over the Internet keeps increasing dramatically. A number of network intrusion detection methods have been developed with respective strengths and weaknesses. The majority of network intrusion detection research and development is still based on s...


Data Mining for Imbalanced Datasets: An Overview

A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs...


Intrusion Detection Using Incremental Learning from Streaming Imbalanced Data

Most network environments keep facing an ever-increasing number of security threats. In earlier times, firewalls were used as a security checkpoint in the network environment. Recently, the use of Intrusion Detection Systems (IDS) has greatly increased because they work more constructively and robustly than firewalls. An IDS refers to the process of constantly observing the incoming and outg...


Data Mining Methods for Network Intrusion Detection

Network intrusion detection systems have become a standard component in security infrastructures. Unfortunately, current systems are poor at detecting novel attacks without an unacceptable level of false alarms. We propose that the solution to this problem is the application of an ensemble of data mining techniques which can be applied to network connection data in an offline environment, augme...


A Multiple Resampling Method for Learning from Imbalanced Data Sets

Re-Sampling methods are commonly used for dealing with the class-imbalance problem. Their advantage over other methods is that they are external and thus, easily transportable. Although such approaches can be very simple to implement, tuning them most effectively is not an easy task. In particular, it is unclear whether oversampling is more effective than undersampling and which oversampling or...



Journal

Journal title: Journal of Big Data

Year: 2021

ISSN: 2196-1115

DOI: https://doi.org/10.1186/s40537-020-00390-x